SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Authors

  • Sebastian Will
  • Christina Schmiedl
  • Milad Miladi
  • Mathias Möhl
  • Rolf Backofen
Abstract

MOTIVATION: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, based on simultaneous alignment and folding, suffers from an extreme time complexity of $O(n^6)$, where $n$ is the sequence length. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (at least quartic time).

RESULTS: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE is the first to transfer the Sankoff algorithm completely to the lightweight energy model. Compared with LocARNA, SPARSE achieves similar alignment quality and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low-sequence-identity instances substantially more accurately than RAF, which uses sequence-based heuristics.
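The abstract credits much of SPARSE's speed to sparsification based on structural properties of the RNA ensembles. As a rough illustration only, the sketch below shows the simplest form of ensemble-based sparsification: discarding base pairs whose probability in the folding ensemble falls below a cutoff, which shrinks the set of arc matches a Sankoff-style aligner has to consider. This is not SPARSE's actual sparsification scheme, which the abstract does not detail; the function names `sparsify_pairs`, `arc_matches` and `all_pairs`, the `cutoff` value and the toy probabilities are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# A base pair (i, j) with 1-based positions and i < j.
BasePair = Tuple[int, int]


def sparsify_pairs(probs: Dict[BasePair, float], cutoff: float) -> List[BasePair]:
    """Return the base pairs whose ensemble probability is at least `cutoff`.

    `probs` stands in for base-pair probabilities from a partition-function
    folding (e.g. McCaskill's algorithm); here the caller supplies them directly.
    The cutoff is a hypothetical parameter, not a documented SPARSE setting.
    """
    return sorted(bp for bp, p in probs.items() if p >= cutoff)


def arc_matches(
    pairs_a: List[BasePair], pairs_b: List[BasePair]
) -> List[Tuple[BasePair, BasePair]]:
    """All pairings of one base pair from sequence A with one from sequence B.

    These are the candidate arc matches an arc-based aligner would score.
    """
    return list(product(pairs_a, pairs_b))


def all_pairs(n: int) -> List[BasePair]:
    """Every position pair (i, j) with i < j in a length-n sequence.

    This ignores base complementarity and minimum loop length, so it is an
    upper bound on the unsparsified candidate set.
    """
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]


if __name__ == "__main__":
    # Invented probabilities for two short toy sequences (lengths 10 and 11).
    probs_a = {(1, 10): 0.92, (2, 9): 0.85, (3, 8): 0.40, (4, 7): 0.02}
    probs_b = {(1, 11): 0.88, (2, 10): 0.79, (5, 9): 0.03}

    kept_a = sparsify_pairs(probs_a, cutoff=0.1)
    kept_b = sparsify_pairs(probs_b, cutoff=0.1)
    print(kept_a)  # [(1, 10), (2, 9), (3, 8)]
    print(kept_b)  # [(1, 11), (2, 10)]

    # Candidate arc matches with and without sparsification.
    print(len(arc_matches(kept_a, kept_b)))  # 6
    print(len(arc_matches(all_pairs(10), all_pairs(11))))  # 2475
```

The point of the illustration is that an arc-based aligner's work grows with the number of candidate arc matches, so pruning improbable pairs in both sequences pays off multiplicatively (6 versus 2,475 candidates in the toy case). How SPARSE pushes this further to reach quadratic time is described in the full paper, not in this sketch.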


Related resources


Publications of Sebastian Will

[2] Sebastian Will and Hosna Jabbari. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction. quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Local exact pattern matching for non-fixed RNA structures. Structure-based whole genome realignment reveals many novel non-coding RNAs. CRISPRmap: an automated classification o...


Publications of Sebastian Will: Journal Articles

quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Local exact pattern matching for non-fixed RNA structures. Structure-based whole genome realignment reveals many novel non-coding RNAs. CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems. Incorporating thermodynamic stability in sequence and structure-ba...


A max-margin model for efficient simultaneous alignment and folding of RNA sequences

MOTIVATION The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for ...


Simultaneous Alignment and Folding of Protein Sequences

Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-A...



Volume: 31

Publication date: 2013